This code through explores sentiment analysis, a technique for evaluating whether a piece of text is positive, negative, or neutral overall. It relies mainly on machine learning and natural language processing (NLP) to classify and interpret emotions. Businesses often use it to collect and analyze customer feedback.
How does sentiment analysis work?
Text mining (also called text analytics) is the process of converting unstructured text data into meaningful information. Applying it yields a list of terms that can be matched against a lexicon. In practice, you compare the terms to emotion lexicons and return scores that represent the emotional content of the text.
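The matching step can be sketched in a few lines of base R. This is only an illustration of the idea, not how any particular package implements it; the tiny lexicon, its scores, and the helper `score_text` are invented for the example.

```r
# Hypothetical toy lexicon: the words and scores are invented for illustration
toy_lexicon <- data.frame(
  word  = c("love", "awesome", "dead", "hurt"),
  value = c(1, 1, -1, -1),
  stringsAsFactors = FALSE
)

# Score one string: tokenize, look each token up in the lexicon, sum the matches
score_text <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  matched <- lexicon$value[match(tokens, lexicon$word)]
  sum(matched, na.rm = TRUE)  # tokens missing from the lexicon contribute nothing
}

texts <- c("Awesome app, love it", "My phone was dead", "I need to upgrade")
toy_scores <- vapply(texts, score_text, numeric(1), lexicon = toy_lexicon,
                     USE.NAMES = FALSE)
toy_scores
## [1]  2 -1  0
```

A string with no lexicon words scores zero, which is why many of the tweets later in this code through come back as 0.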
Specifically, I will explain and demonstrate the syuzhet package, focusing on its two major functions: get_sentiment and get_nrc_sentiment.
The Russian formalists Victor Shklovsky and Vladimir Propp divided narrative into two components, the “fabula” and the “syuzhet”. Syuzhet refers to the technique of a narrative, while fabula is the chronological order of events. Syuzhet is therefore concerned with the manner in which the elements of the story (fabula) are organized (syuzhet).
This package reveals the latent structure of narrative by means of sentiment analysis. Among other lexicons, it implements Saif Mohammad’s NRC Emotion lexicon, which covers eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust). The lexicon also tags terms with two polarities, positive and negative; a term with neither tag is effectively neutral.
This package is now available on CRAN (http://cran.r-project.org/web/packages/syuzhet/).
Other packages also aim to make sense of unstructured text, including tidytext and sentimentr. Syuzhet and tidytext share the same limitation, since they both rely on the “Bing,” “AFINN,” and “NRC” lexicons. For example, the Bing and AFINN lexicons treat the word “miss” as negative, while NRC does not. The sentimentr package addresses this issue by considering domain-specific lexicons and by taking valence shifters into account (such as “not”, “very”, or “doesn’t”).
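To make the idea of a valence shifter concrete, here is a minimal base R sketch in which a negator flips the sign of the word that directly follows it. It is not sentimentr's algorithm, which is considerably more elaborate; the lexicon, the negator list, and the helper `score_with_negation` are all invented for illustration.

```r
# Hypothetical toy lexicon and negator list, invented for illustration
toy_lexicon <- c(love = 1, good = 1, bad = -1, hate = -1)
negators <- c("not", "never", "doesn't", "don't")

# Sum word scores, flipping the sign of a word that directly follows a negator
score_with_negation <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  if (length(tokens) == 0) return(0)
  values <- unname(toy_lexicon[tokens])
  values[is.na(values)] <- 0  # words outside the lexicon score zero
  negated <- c(FALSE, head(tokens, -1) %in% negators)
  sum(ifelse(negated, -values, values))
}

score_with_negation("This app is good")      # 1
score_with_negation("This app is not good")  # -1
```

A plain lexicon lookup would score both sentences as 1, which is the limitation described above.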
get_sentiment
Description
Iterates over a vector of strings and returns sentiment values based on the chosen method.
Syntax:
library(syuzhet)
char_v <- c("cry", "hug", "hurt")
get_sentiment(char_v, method = "syuzhet", language = "english", lexicon = NULL)
## [1] -0.75 0.75 -0.75
Arguments
char_v: A vector of strings.
method: A string indicating which sentiment method to use. Options include “syuzhet”, “bing”, “afinn”, “nrc” and “stanford.” The default method is “syuzhet”.
language: A string naming the language of the text. It only applies to the “nrc” method.
lexicon: A data frame with at least two columns labeled “word” and “value.”
Return Value: A numeric vector of sentiment values, one for each input string.
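As a sketch of how the lexicon argument might be used, the example below scores two sentences against a small hand-made lexicon. It assumes the installed version of syuzhet accepts method = "custom" together with a lexicon data frame; the words, values, and sentences are chosen purely for illustration.

```r
library(syuzhet)

# Hypothetical custom lexicon; the words and values are chosen for illustration
custom_lexicon <- data.frame(
  word  = c("love", "hate", "beautiful", "ugly"),
  value = c(1, -1, 1, -1),
  stringsAsFactors = FALSE
)

char_v <- c(
  "I love when I see something beautiful.",
  "I hate it when ugly feelings creep into my head."
)

# method = "custom" asks get_sentiment() to score against the supplied lexicon
custom_scores <- get_sentiment(char_v, method = "custom", lexicon = custom_lexicon)
custom_scores
```

The result is a numeric vector with one score per input string, positive for the first sentence and negative for the second.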
get_nrc_sentiment
Description
Calls the NRC sentiment dictionary to calculate the presence of eight different emotions and their corresponding valence in a text file.
Syntax:
get_nrc_sentiment(char_v, language = "english")
Arguments:
char_v: A character vector
language: A string
Return Value: A data frame where each row represents a sentence from the original file. The columns include one for each emotion type as well as a positive or negative valence. The ten columns are as follows: “anger”, “anticipation”, “disgust”, “fear”, “joy”, “sadness”, “surprise”, “trust”, “negative”, “positive.”
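A short sketch of the return value described above, using two made-up sentences rather than the dataset from this code through:

```r
library(syuzhet)

# Two made-up sentences used only for illustration
char_v <- c("I love this phone.", "The line at the store was awful.")

nrc_scores <- get_nrc_sentiment(char_v, language = "english")

dim(nrc_scores)       # one row per input string, ten columns
colnames(nrc_scores)  # eight emotions followed by "negative" and "positive"
```

Each cell counts how many words in that string are associated with the column's emotion or valence in the NRC lexicon.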
I will use a dataset named “Brands and Product Emotions” from data.world. It contains tweets about multiple brands and products that were evaluated for the emotion they express.
URL <- "https://query.data.world/s/6fd24pmaed5nmmjlx5nfexwfntar3f"
dat <- read.csv(URL, header = TRUE, stringsAsFactors = FALSE)
## [1] ".@wesley83 I have a 3G iPhone."
## [2] "After 3 hrs tweeting at #RISE_Austin, it was dead!"
## [3] "I need to upgrade."
## [4] "Plugin stations at #SXSW."
## [5] "@jessedee Know about @fludapp ?"
## [6] "Awesome iPad/iPhone app that you'll likely appreciate for its design."
## [1] 0.0 -1.0 0.0 0.0 0.0 1.1
## [1] 0 0 0 0 0 0
Remark: The different methods will return slightly different results since each method uses a different scale.
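Because the scales differ, one simple way to compare methods is to look only at the sign of each score. The sketch below does this for three made-up sentences; it is an illustration, not the code that produced the output in this code through.

```r
library(syuzhet)

# Made-up sentences for illustration
char_v <- c(
  "I love this phone.",
  "The battery died and I hate it.",
  "I need to upgrade."
)

methods <- c("syuzhet", "bing", "afinn", "nrc")

# One column of scores per method, one row per sentence
scores <- sapply(methods, function(m) get_sentiment(char_v, method = m))

# sign() reduces every score to -1, 0, or 1 so the methods can be compared
sign(scores)
```

Rows where the signs disagree point to words that the lexicons treat differently.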
## [1] 4025.35
## [1] "VatorNews - Google And Apple Force Print Media to Evolve?"
## [2] "Khoi Vinh (@mention says Conde Nast's headlong rush into iPad publishing was a "fundamental misunderstanding" of the platform #sxsw"
## [3] "{link} RT @mention 1st stop on the #SXSW #Chaos & @mention hunt: Austin Java."
## [4] "RT @mention Line at the Apple store is insane.."
## [5] "Somebody kidnap her and put her in a recording studio until she records a new album."
## [6] "RT @mention giving added value to location based services needs to battle check-in fatigue #google #pnid #sxsw"
## [1] "@sxsw I hope this year's festival isn't as crashy as this year's iPhone app."
## [2] "#SXSW is just starting, #CTIA is around the corner and #googleio is only a hop skip and a jump from there, good time to be an #android fan"
## [3] "Excited to meet the @samsungmobileus at #sxsw so I can show them my Sprint Galaxy S still running Android 2.1."
## [4] "Gotta love this #SXSW Google Calendar featuring top parties/ show cases to check out."
## [5] "RT @malbonster: Lovely review from Forbes for our SXSW iPad app Holler Gram - http://t.co/g4GZypV"
## [6] "God."
##       anger anticipation disgust fear joy sadness surprise trust negative positive
## 17393     0            0       0    0   0       0        0     1        0        1
## 17394     0            0       0    0   0       0        0     0        0        0
## 17395     0            0       0    0   0       0        0     0        0        0
## 17396     0            1       0    0   0       1        0     0        1        0
## 17397     0            0       0    0   0       0        0     0        0        0
## 17398     0            0       0    0   0       0        0     0        0        0
barplot(
  sort(colSums(prop.table(nrc_data[, 1:8]))),
  horiz = TRUE,
  cex.names = 0.7,
  las = 1,
  main = "Emotions in Sample text", xlab = "Proportion"
)
library(ggplot2)
mydataCopy <- dat
# carry out sentiment mining with get_nrc_sentiment() and store the findings in result
# note: as.character() on a data frame yields one long string per column,
# so each row of result holds the totals for one column of the data
result <- get_nrc_sentiment(as.character(mydataCopy))
# transpose result so that each row is a sentiment
result1 <- data.frame(t(result))
# rowSums adds up each row, giving one total count per sentiment
new_result <- data.frame(rowSums(result1))
# name the rows and columns of the data frame
names(new_result)[1] <- "count"
new_result <- cbind("sentiment" = rownames(new_result), new_result)
rownames(new_result) <- NULL
# plot the first 8 rows, the distinct emotions
qplot(sentiment, data = new_result[1:8, ], weight = count, geom = "bar", fill = sentiment) + ggtitle("Products Sentiments")
# plot the last 2 rows, positive and negative
qplot(sentiment, data = new_result[9:10, ], weight = count, geom = "bar", fill = sentiment) + ggtitle("Products Sentiments")
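The transpose and rowSums steps above can also be written with colSums. The sketch below shows this on a small hand-made data frame that stands in for get_nrc_sentiment() output; the values are invented so the example runs without the package or the dataset.

```r
# Hypothetical stand-in for get_nrc_sentiment() output: three strings, ten columns
result <- data.frame(
  anger = c(0, 1, 0), anticipation = c(1, 0, 0), disgust = c(0, 0, 0),
  fear = c(0, 1, 0), joy = c(1, 0, 0), sadness = c(0, 1, 0),
  surprise = c(0, 0, 0), trust = c(1, 0, 1),
  negative = c(0, 1, 0), positive = c(1, 0, 1)
)

# colSums() totals each sentiment directly, so no transpose is needed
totals <- colSums(result)
new_result <- data.frame(sentiment = names(totals), count = unname(totals))
new_result
```

The resulting data frame has the same two columns, sentiment and count, that the plotting calls expect.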
Learn more about sentiment analysis and the syuzhet package with the following:
Resource I Text Mining
Resource II Sentiment Analysis
Resource III Basic Sentiment Analysis
This code through references and cites the following sources:
Lorna (2018). Source I. Exploring Sentiment Analysis
Matthew (2017). Source II. Syuzhet Documentation
Hoyeol (2018). Source III. Tidytext Package
Richard (2019). Source IV. Introduction to the Syuzhet Package